Adaptive sparse matrix representation for efficient matrix–vector multiplication

Published in The Journal of Supercomputing

Abstract

A wide range of applications in engineering and scientific computing rely on sparse matrix computation. Many data representations exist for storing the non-zero elements of sparse matrices, and each favors some matrices while performing poorly on others. Existing studies tend to process all applications, e.g., the most common kernel, matrix–vector multiplication, across different sparse matrix structures using a single fixed representation. Moreover, although Graphics Processing Units (GPUs) have evolved into a very attractive platform for general-purpose computation, most existing work on sparse matrix–vector multiplication (SpMV, for short) targets CPUs. In this work, we design and implement an adaptive GPU-based SpMV scheme that selects the best storage format for each input matrix with the configuration and characteristics of GPUs in mind. We study the effect of various parameters and settings on the performance of SpMV applications under different data representations, and then employ an adaptive scheme that executes each sparse matrix application with a suitable representation format. Evaluation results show that our run-time adaptive scheme adapts to different applications by selecting an appropriate representation for each input sparse matrix. Preliminary results show that, on average, our adaptive scheme improves the performance of sparse matrix–vector multiplication by 2.1× for single precision and 1.6× for double precision.
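The abstract does not reproduce the paper's selection criteria, but the core idea — a per-matrix choice of storage format driven by the matrix's structure — can be illustrated with a minimal sketch. The Python below shows a CSR SpMV kernel and a toy format selector that picks between CSR and ELL based on row-length regularity (ELL pads every row to the longest row, which wastes memory when row lengths vary widely). The function names, the padding threshold, and the CSR-vs-ELL rule are illustrative assumptions, not the authors' actual scheme.

```python
def csr_spmv(values, col_idx, row_ptr, x):
    """Multiply a CSR-format sparse matrix by dense vector x.

    values  : non-zero values, row by row
    col_idx : column index of each non-zero
    row_ptr : row_ptr[i]..row_ptr[i+1] delimits row i's non-zeros
    """
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        for j in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += values[j] * x[col_idx[j]]
    return y


def choose_format(row_lengths, pad_threshold=2.0):
    """Toy heuristic: prefer ELL only when rows are similar in length.

    ELL pads every row to max(row_lengths); if that padding would more
    than double the average row's storage, fall back to CSR.  The
    threshold value is an illustrative assumption.
    """
    max_len = max(row_lengths)
    mean_len = sum(row_lengths) / len(row_lengths)
    if mean_len == 0 or max_len / mean_len > pad_threshold:
        return "CSR"
    return "ELL"
```

For example, the 2x3 matrix [[1, 0, 2], [0, 3, 0]] in CSR form is `values=[1, 2, 3]`, `col_idx=[0, 2, 1]`, `row_ptr=[0, 2, 3]`; multiplying it by the all-ones vector gives `[3.0, 3.0]`. A matrix with row lengths `[3, 3, 4, 3]` would be routed to ELL by this heuristic, while `[1, 1, 100, 1]` would be routed to CSR.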



Author information

Corresponding author: Pantea Zardoshti.

Electronic supplementary material

Supplementary material 1 (pdf 2309 KB)

Cite this article

Zardoshti, P., Khunjush, F. & Sarbazi-Azad, H. Adaptive sparse matrix representation for efficient matrix–vector multiplication. J Supercomput 72, 3366–3386 (2016). https://doi.org/10.1007/s11227-015-1571-0
